# SigLIP Visual Encoding
SmolVLM 500M Anime Caption v0.2
Apache-2.0
A vision-language model specialized in describing anime-style images, fine-tuned from SmolVLM-500M-Base.
Image-to-Text English
Andres77872
17
0
ViT So400m Patch14 SigLIP 378.webli
Apache-2.0
A Vision Transformer model based on SigLIP that contains only the image encoder and uses the original attention pooling mechanism.
Image Classification
Transformers

timm
82
0
LLM-jp-3 VILA 14B
A large-scale vision-language model developed by Japan's National Institute of Informatics, supporting Japanese and English with strong image understanding and text generation capabilities.
Image-to-Text Japanese
llm-jp
106
10